This dataset records how much respondents like different movie genres.
The available data have the following characteristics:
| | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Horror | 1 | 958 | 2.764092 | 1.4070597 | 3 | 2.705729 | 1.4826 | 1 | 5 | 4 | 0.1954091 | -1.2507644 | 0.0454600 |
| Thriller | 2 | 958 | 3.365344 | 1.2008920 | 4 | 3.430990 | 1.4826 | 1 | 5 | 4 | -0.3419797 | -0.8303760 | 0.0387991 |
| Comedy | 3 | 958 | 4.500000 | 0.7808443 | 5 | 4.662760 | 0.0000 | 1 | 5 | 4 | -1.6312257 | 2.4132180 | 0.0252279 |
| Romantic | 4 | 958 | 3.484342 | 1.2117783 | 4 | 3.559896 | 1.4826 | 1 | 5 | 4 | -0.3312922 | -0.8692307 | 0.0391508 |
| SciFi | 5 | 958 | 3.094990 | 1.3132060 | 3 | 3.118490 | 1.4826 | 1 | 5 | 4 | -0.0424614 | -1.1142428 | 0.0424278 |
| War | 6 | 958 | 3.132568 | 1.3453600 | 3 | 3.165365 | 1.4826 | 1 | 5 | 4 | -0.0566352 | -1.1557596 | 0.0434666 |
| Fantasy | 7 | 958 | 3.744259 | 1.1861018 | 4 | 3.858073 | 1.4826 | 1 | 5 | 4 | -0.5480089 | -0.7192661 | 0.0383212 |
| Animated | 8 | 958 | 3.771399 | 1.2321606 | 4 | 3.906250 | 1.4826 | 1 | 5 | 4 | -0.6579499 | -0.6674859 | 0.0398093 |
| Documentary | 9 | 958 | 3.633612 | 1.1348004 | 4 | 3.721354 | 1.4826 | 1 | 5 | 4 | -0.4794855 | -0.5959519 | 0.0366637 |
| Action | 10 | 958 | 3.514614 | 1.2340076 | 4 | 3.602865 | 1.4826 | 1 | 5 | 4 | -0.4085078 | -0.8815674 | 0.0398690 |
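As a quick sanity check on the table above, the `se` column can be recomputed from `sd` and `n`. This is a minimal sketch that assumes `se` is the standard error of the mean, sd / sqrt(n); the two `sd` values are copied from the table.

```r
# Recompute the standard error of the mean for two rows of the table.
# Assumption: se = sd / sqrt(n).
n <- 958
sd_values <- c(Horror = 1.4070597, Comedy = 0.7808443)

se_values <- sd_values / sqrt(n)
print(round(se_values, 7))
```

The results agree with the `se` values reported for Horror and Comedy.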
To find out whether there is an acceptable way to reduce the dimensionality of the data, we must check a few properties:
The correlation matrix is the basis of this step. Let's look at it:
If some of the variables are correlated, there is hope for a good dimensionality reduction.
To assess this, we compute the Kaiser-Meyer-Olkin (KMO) index:
| | MSA |
|---|---|
| Horror | 0.5450566 |
| Thriller | 0.6310535 |
| Comedy | 0.5780425 |
| Romantic | 0.7132902 |
| SciFi | 0.7393117 |
| War | 0.7501253 |
| Fantasy | 0.5786192 |
| Animated | 0.5661427 |
| Documentary | 0.6126153 |
| Action | 0.6593483 |
| Global MSA | 0.6238825 |
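The KMO index compares the squared correlations with the squared partial correlations. The sketch below implements that ratio in base R. The function name `kmo_sketch` and the toy data are hypothetical, and it uses Pearson correlations, whereas the table above may have been computed from a different correlation matrix (for example, a polychoric one).

```r
# Overall KMO and per-variable MSA from a correlation matrix (base R only).
kmo_sketch <- function(R) {
  Rinv <- solve(R)
  # Partial correlation of each pair, controlling for all other variables.
  P <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))
  diag(R) <- 0  # only off-diagonal elements enter the index
  diag(P) <- 0
  r2 <- R^2
  p2 <- P^2
  list(overall = sum(r2) / (sum(r2) + sum(p2)),
       per_variable = colSums(r2) / (colSums(r2) + colSums(p2)))
}

# Hypothetical data: three columns sharing a common source, one independent.
set.seed(1)
z <- rnorm(200)
toy <- data.frame(a = z + rnorm(200), b = z + rnorm(200),
                  c = z + rnorm(200), d = rnorm(200))

res <- kmo_sketch(cor(toy))
print(res)
```

Values closer to 1 mean the partial correlations are small relative to the raw correlations, which favors factor analysis.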
Here are two tests:
## [1] FALSE
## [1] FALSE
## Parallel analysis suggests that the number of factors = 4 and the number of components = NA
Since four factors are suggested, let's fit a four-factor solution:
##
## mixed.cor is deprecated, please use mixedCor.
##
## Loadings:
## ML2 ML1 ML3 ML4
## Horror 0.994
## Thriller 0.548 0.324
## Comedy 0.162 0.778
## Romantic 0.310 -0.176 -0.314 0.450
## SciFi 0.152 0.523
## War 0.140 0.488 -0.174
## Fantasy 0.941 0.168
## Animated 0.777 0.149
## Documentary 0.182 0.315 -0.110
## Action 0.104 0.709 0.150
##
## ML2 ML1 ML3 ML4
## SS loadings 1.656 1.399 1.328 0.931
## Proportion Var 0.166 0.140 0.133 0.093
## Cumulative Var 0.166 0.305 0.438 0.531
##
## Loadings:
## ML2 ML1 ML3 ML4
## Horror 0.994
## Thriller 0.548
## Comedy 0.778
## Romantic 0.450
## SciFi 0.523
## War 0.488
## Fantasy 0.941
## Animated 0.777
## Documentary
## Action 0.709
##
## ML2 ML1 ML3 ML4
## SS loadings 1.656 1.399 1.328 0.931
## Proportion Var 0.166 0.140 0.133 0.093
## Cumulative Var 0.166 0.305 0.438 0.531
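The variance rows in the output above follow directly from the `SS loadings` row. A minimal sketch, assuming each proportion is the sum of squared loadings divided by the number of observed variables (10 here):

```r
# Reproduce "Proportion Var" and "Cumulative Var" from "SS loadings".
ss_loadings <- c(ML2 = 1.656, ML1 = 1.399, ML3 = 1.328, ML4 = 0.931)
n_vars <- 10  # ten genres in the analysis

prop_var <- ss_loadings / n_vars
cum_var <- cumsum(prop_var)
print(round(rbind(prop_var, cum_var), 3))
```

Because the inputs are already rounded to three decimals, the last digit can differ slightly from the printed output. Together, the four factors account for a little over half of the total variance.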
When each variable loads on a single factor, we have a simple structure.
We review the communalities to see how much each variable contributes to the overall solution:
## Documentary War SciFi Thriller Romantic Action
## 0.1545944 0.2891386 0.2977004 0.4096358 0.4278713 0.5422617
## Animated Comedy Fantasy Horror
## 0.6288917 0.6442086 0.9239800 0.9950000
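A communality is the sum of a variable's squared loadings across the factors. The sketch below applies this to Romantic, the one variable whose four loadings are all printed in the first loadings table; the values are copied from that output at three decimals.

```r
# Communality of Romantic = sum of its squared loadings.
romantic <- c(ML2 = 0.310, ML1 = -0.176, ML3 = -0.314, ML4 = 0.450)

h2_romantic <- sum(romantic^2)
print(round(h2_romantic, 3))
```

The result agrees with the communality reported for Romantic up to the rounding of the printed loadings.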
Its complement is the uniqueness, the share of each variable's variance that is not shared with the factors:
## Horror Fantasy Comedy Animated Action Romantic
## 0.00499907 0.07602016 0.35579184 0.37110948 0.45774106 0.57213198
## Thriller SciFi War Documentary
## 0.59036831 0.70230646 0.71087869 0.84542349
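Since uniqueness is the complement of communality, the two reported values should add up to roughly 1 for every variable. A small check using the numbers printed above:

```r
# Reported communalities and uniquenesses, matched by variable name.
communality <- c(Documentary = 0.1545944, War = 0.2891386, SciFi = 0.2977004,
                 Thriller = 0.4096358, Romantic = 0.4278713, Action = 0.5422617,
                 Animated = 0.6288917, Comedy = 0.6442086, Fantasy = 0.9239800,
                 Horror = 0.9950000)
uniqueness <- c(Horror = 0.00499907, Fantasy = 0.07602016, Comedy = 0.35579184,
                Animated = 0.37110948, Action = 0.45774106, Romantic = 0.57213198,
                Thriller = 0.59036831, SciFi = 0.70230646, War = 0.71087869,
                Documentary = 0.84542349)

# Align by name before adding, since the two vectors are sorted differently.
total <- communality + uniqueness[names(communality)]
print(round(total, 4))
```

Every sum is 1 to within a small numerical tolerance.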
Complexity tells us whether a variable belongs to a single factor or not; a value close to 1 means it loads on only one factor:
## Horror Animated Fantasy Comedy Action SciFi
## 1.014860 1.085028 1.086306 1.127946 1.158629 1.177234
## War Thriller Documentary Romantic
## 1.440042 1.654373 2.130151 3.011648
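One common definition of complexity is Hofmann's index, (Σa²)² / Σa⁴, where a are a variable's loadings. The sketch below assumes this is the index shown above and applies it to the printed Romantic loadings.

```r
# Hofmann's complexity index for one variable.
# Assumption: the complexity reported above uses this formula.
romantic <- c(ML2 = 0.310, ML1 = -0.176, ML3 = -0.314, ML4 = 0.450)

complexity_romantic <- sum(romantic^2)^2 / sum(romantic^4)
print(round(complexity_romantic, 2))
```

The result is about 3, in line with the value reported for Romantic: its variance is spread over roughly three factors, so it does not fit a simple structure well.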
That is, we hypothesize that the following structure exists (the latent variable names mean "alone", "as a couple", and "in a group"):
modelCFA <- '
# measurement model
A_Solas =~ War + Action + SciFi + Documentary
En_Pareja =~ Horror + Thriller
En_Grupo =~ Fantasy + Animated + Comedy + Romantic
'
We can declare which variables are ordinal:
ORDINALES=c("Horror","Thriller","Comedy","Romantic", "SciFi","War","Fantasy","Animated","Documentary","Action")
Then we estimate the model:
library(lavaan)  # provides cfa() and sem()
cfaFIT <- cfa(modelCFA,
              data = theData,
              ordered = ORDINALES)
And we inspect the results:
| lhs | op | rhs | est | se | z | pvalue | ci.lower | ci.upper | std.lv | std.all | std.nox |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A_Solas | =~ | War | 1.0000000 | 0.0000000 | NA | NA | 1.0000000 | 1.0000000 | 0.5470044 | 0.5470044 | 0.5470044 |
| A_Solas | =~ | Action | 1.2034374 | 0.0985248 | 12.214561 | 0e+00 | 1.0103323 | 1.3965425 | 0.6582856 | 0.6582856 | 0.6582856 |
| A_Solas | =~ | SciFi | 1.0626296 | 0.0954379 | 11.134253 | 0e+00 | 0.8755748 | 1.2496845 | 0.5812631 | 0.5812631 | 0.5812631 |
| A_Solas | =~ | Documentary | 0.3840705 | 0.0765992 | 5.014030 | 5e-07 | 0.2339389 | 0.5342021 | 0.2100883 | 0.2100883 | 0.2100883 |
| En_Pareja | =~ | Horror | 1.0000000 | 0.0000000 | NA | NA | 1.0000000 | 1.0000000 | 0.5538075 | 0.5538075 | 0.5538075 |
| En_Pareja | =~ | Thriller | 1.8197169 | 0.2756180 | 6.602314 | 0e+00 | 1.2795155 | 2.3599184 | 1.0077729 | 1.0077729 | 1.0077729 |
| En_Grupo | =~ | Fantasy | 1.0000000 | 0.0000000 | NA | NA | 1.0000000 | 1.0000000 | 0.9808019 | 0.9808019 | 0.9808019 |
| En_Grupo | =~ | Animated | 0.7675995 | 0.0494625 | 15.518817 | 0e+00 | 0.6706548 | 0.8645442 | 0.7528630 | 0.7528630 | 0.7528630 |
| En_Grupo | =~ | Comedy | 0.3632286 | 0.0414612 | 8.760678 | 0e+00 | 0.2819661 | 0.4444912 | 0.3562553 | 0.3562553 | 0.3562553 |
| En_Grupo | =~ | Romantic | 0.4726677 | 0.0364832 | 12.955771 | 0e+00 | 0.4011620 | 0.5441735 | 0.4635934 | 0.4635934 | 0.4635934 |
Graphically:
Or…
How good is our proposal?
## $chisq
## [1] 346.596
##
## $df
## [1] 32
##
## $pvalue
## [1] 0
## [1] 0.9098774
## $rmsea.ci.lower
## [1] 0.09187005
##
## $rmsea
## [1] 0.1013551
##
## $rmsea.ci.upper
## [1] 0.1111277
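The RMSEA point estimate can be recovered from the chi-square statistic, its degrees of freedom, and the sample size. This sketch assumes the usual formula sqrt(max(0, (χ² − df) / (df · (N − 1)))) and takes N = 958 from the descriptive table.

```r
# Recompute the RMSEA point estimate from the reported fit statistics.
chisq <- 346.596
df <- 32
n <- 958

rmsea <- sqrt(max(0, (chisq - df) / (df * (n - 1))))
print(round(rmsea, 4))
```

The result agrees with the `$rmsea` value shown above.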
Here we want to specify a regression that combines the observed variables with the latent ones.
This could be one such model:
modelSEM <- '
# measurement model
A_Solas =~ War + Action + SciFi + Documentary
En_Pareja =~ Horror + Thriller
En_Grupo =~ Fantasy + Animated + Comedy + Romantic
# regressions
A_Solas ~ En_Pareja + En_Grupo
'
We fit the regression:
semFIT <- sem(modelSEM,
data=theData,
ordered = ORDINALES)
These are the results:
Regressions
| | lhs | op | rhs | est | se | z | pvalue | ci.lower | ci.upper | std.lv | std.all | std.nox |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | A_Solas | ~ | En_Pareja | 0.4154710 | 0.0460657 | 9.019104 | 0.0000000 | 0.3251839 | 0.505758 | 0.4206386 | 0.4206386 | 0.4206386 |
| 12 | A_Solas | ~ | En_Grupo | -0.0292517 | 0.0238584 | -1.226053 | 0.2201787 | -0.0760134 | 0.017510 | -0.0524495 | -0.0524495 | -0.0524495 |
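To read the regression table, it helps to see how `z`, `pvalue`, and the confidence limits relate to `est` and `se`. The sketch below assumes a Wald z-test with 95% normal-theory intervals; the estimates and standard errors are copied from the table.

```r
# Rebuild z, p-value, and 95% CI from the reported estimates and standard errors.
est <- c(En_Pareja = 0.4154710, En_Grupo = -0.0292517)
se  <- c(En_Pareja = 0.0460657, En_Grupo = 0.0238584)

z <- est / se
pvalue <- 2 * pnorm(-abs(z))   # two-sided p-value
crit <- qnorm(0.975)           # about 1.96
ci_lower <- est - crit * se
ci_upper <- est + crit * se

print(round(cbind(z, pvalue, ci_lower, ci_upper), 4))
```

The effect of `En_Pareja` on `A_Solas` is clearly different from zero, while the interval for `En_Grupo` includes zero, which matches its p-value of about 0.22.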
Loadings
| lhs | op | rhs | est | se | z | pvalue | ci.lower | ci.upper | std.lv | std.all | std.nox |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A_Solas | =~ | War | 1.0000000 | 0.0000000 | NA | NA | 1.0000000 | 1.0000000 | 0.5470042 | 0.5470042 | 0.5470042 |
| A_Solas | =~ | Action | 1.2034377 | 0.0985249 | 12.214554 | 0e+00 | 1.0103325 | 1.3965430 | 0.6582856 | 0.6582856 | 0.6582856 |
| A_Solas | =~ | SciFi | 1.0626299 | 0.0954380 | 11.134248 | 0e+00 | 0.8755749 | 1.2496848 | 0.5812631 | 0.5812631 | 0.5812631 |
| A_Solas | =~ | Documentary | 0.3840704 | 0.0765992 | 5.014027 | 5e-07 | 0.2339387 | 0.5342020 | 0.2100881 | 0.2100881 | 0.2100881 |
| En_Pareja | =~ | Horror | 1.0000000 | 0.0000000 | NA | NA | 1.0000000 | 1.0000000 | 0.5538079 | 0.5538079 | 0.5538079 |
| En_Pareja | =~ | Thriller | 1.8197146 | 0.2756176 | 6.602317 | 0e+00 | 1.2795141 | 2.3599151 | 1.0077724 | 1.0077724 | 1.0077724 |
| En_Grupo | =~ | Fantasy | 1.0000000 | 0.0000000 | NA | NA | 1.0000000 | 1.0000000 | 0.9808015 | 0.9808015 | 0.9808015 |
| En_Grupo | =~ | Animated | 0.7675999 | 0.0494625 | 15.518820 | 0e+00 | 0.6706551 | 0.8645446 | 0.7528632 | 0.7528632 | 0.7528632 |
| En_Grupo | =~ | Comedy | 0.3632288 | 0.0414613 | 8.760679 | 0e+00 | 0.2819662 | 0.4444914 | 0.3562554 | 0.3562554 | 0.3562554 |
| En_Grupo | =~ | Romantic | 0.4726679 | 0.0364832 | 12.955773 | 0e+00 | 0.4011622 | 0.5441736 | 0.4635934 | 0.4635934 | 0.4635934 |
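Visually:
How good is the model?
allFitSEM <- as.list(fitMeasures(semFIT))  # assumed definition; not shown in the source
allFitSEM[c("chisq", "df", "pvalue")]  # a good fit would give pvalue > 0.05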
## $chisq
## [1] 346.596
##
## $df
## [1] 32
##
## $pvalue
## [1] 0
## [1] 0.9098774
## $rmsea.ci.lower
## [1] 0.09187005
##
## $rmsea
## [1] 0.1013551
##
## $rmsea.ci.upper
## [1] 0.1111277